Introduction to Trip Planning Enhancement Analysis 🌟
To refine and elevate the trip planning experience, this document presents a thorough analysis of the platform's current challenges and opportunities. It synthesizes the findings into actionable insights aimed at addressing key errors, optimizing the trip builder, and setting the stage for the strategic evolution of our product and technology offerings.
Importing packages
import pandas as pd
import warnings
from ydata_profiling import ProfileReport
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import scipy.stats as stats
import calendar
# Uncomment to silence warnings during exploration:
# warnings.filterwarnings('ignore')
# warnings.filterwarnings("ignore", category=DeprecationWarning)
Importing the dataframe
df = pd.read_csv('reporting-trip-request-extract.csv')
Importing functions
%run functions.ipynb
Total (rows, columns)
df.shape
(5000, 35)
Renaming the columns for better readability
cols_to_rename = {
    'badproportionerrorcount': 'badproportion_ERR',
    'firstchoiceaccommodationunavailablecount': 'firstaccommodation_ERR',
    'noavailabledotwaccommodationerrorcount': 'DOTW_ERR',
    'nofamilymanualfallbackserrorcount': 'manualfallback_ERR',
    'nogoodscoreerrorcount': 'nogoodscore_ERR',
    'accommodationunavailableerrorcount': 'accommodation_ERR',
    'substituteaccommodationunavailablecount': 'substaccommodation_ERR',
    'timeouterrorcount': 'timeout_ERR',
    'tripbuildtimeseconds': 'tripbuildtime',
    'failtimeseconds': 'failtime',
    'aborteddataerrorcount': 'aborteddata_ERR',
    'failureindurationserrorcount': 'failureinduration_ERR',
    'norouteserrorcount': 'noroutes_ERR',
    'overnightreductionerrorcount': 'overnightreduction_ERR',
    'transportunavailableerrorcount': 'transportunavailable_ERR',
}
df = df.rename(columns=cols_to_rename)
# Strip the form-submission prefix from any column name that carries it
for col in df.columns:
    col1 = col.replace('createtripformsubmission_', '')
    df = df.rename(columns={col: col1})
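The two renaming steps can be tried on a small toy frame before touching the real extract. The sketch below is illustrative only: the toy data and the column `createtripformsubmission_adults` are hypothetical, and the vectorized `str.replace` on the column index is a swapped-in alternative to the loop, assumed to be equivalent for plain (non-regex) prefixes. It also checks that stripping the prefix does not produce duplicate column names.

```python
import pandas as pd

# Toy frame; the third column name is hypothetical and only mimics the prefix pattern.
toy = pd.DataFrame({
    "timeouterrorcount": [0, 2],
    "tripbuildtimeseconds": [3.4, 7.9],
    "createtripformsubmission_adults": [2, 1],
})

# A two-entry subset of the mapping, enough to illustrate the step.
short_names = {
    "timeouterrorcount": "timeout_ERR",
    "tripbuildtimeseconds": "tripbuildtime",
}
prefix = "createtripformsubmission_"

# Step 1: apply the short names. Step 2: strip the prefix in one vectorized call.
renamed = toy.rename(columns=short_names)
renamed.columns = renamed.columns.str.replace(prefix, "", regex=False)

# Guard: removing a prefix could make two columns collide on the same name.
assert renamed.columns.is_unique

print(list(renamed.columns))  # ['timeout_ERR', 'tripbuildtime', 'adults']
```

Running the same uniqueness check on `df` after the rename is a cheap way to catch a collision early, since duplicate column names tend to cause confusing errors later.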
Generating a Comprehensive Profile Report 📊
To understand the dataset before deeper analysis, we generate a comprehensive profile report. The report includes detailed statistics, distributions, and correlations for the columns in the dataset. It is an essential step in preliminary data exploration, helping us identify potential data quality issues, outliers, and patterns that can inform later analysis and modeling decisions.
profile_report = ProfileReport(df)
profile_report
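When the full report is slow to render, a few of the same data-quality signals can be pulled with plain pandas. The following is a minimal sketch, not a reproduction of what the profile report computes: the helper name `quick_quality_summary` and the toy data are hypothetical, and the column names are borrowed from the renaming step only for illustration.

```python
import numpy as np
import pandas as pd


def quick_quality_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column dtype, share of missing values (%), and distinct-value count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing_pct": frame.isna().mean().mul(100).round(1),
        "n_unique": frame.nunique(),  # NaN is not counted as a distinct value
    })


# Toy data standing in for the real extract.
toy = pd.DataFrame({
    "tripbuildtime": [3.4, 7.9, np.nan, 5.1],
    "timeout_ERR": [0, 2, 0, 0],
})

summary = quick_quality_summary(toy)
print(summary)
```

Calling `quick_quality_summary(df)` on the real dataframe would give one row per column, which makes it easy to sort by `missing_pct` and spot columns that need attention.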